
fix(ai, ai-anthropic): thinking blocks missing on turn 2+ in tool loops #391

Open

imsherrill wants to merge 4 commits into TanStack:main from imsherrill:fix/thinking-blocks-per-step

Conversation

@imsherrill (Contributor) commented Mar 20, 2026

Summary

  • Fixes thinking blocks being merged into a single ThinkingPart per message instead of one per step
  • Preserves thinking content and Anthropic signatures in server-side message history for multi-turn context
  • Adds interleaved-thinking-2025-05-14 beta header when thinking is enabled — required by Anthropic API for thinking on tool-result follow-up turns
  • Each thinking step now tracked by stepId through the full stack: adapter → engine → processor → UIMessage
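The per-step tracking described above can be sketched as follows. This is an illustrative model, not the actual StreamProcessor code (the class and method names are hypothetical), but it matches the summary: deltas accumulate under their stepId in a Map, so each step survives as its own part instead of being merged.

```typescript
// Illustrative sketch only: names are hypothetical; the idea mirrors the PR.
type ThinkingStep = { content: string; signature?: string }

class StepThinkingTracker {
  private steps = new Map<string, ThinkingStep>()
  private order: Array<string> = []

  // Append a streamed delta to the step it belongs to, creating the step
  // on first sight so ordering follows arrival order.
  append(stepId: string, delta: string): void {
    if (!this.steps.has(stepId)) {
      this.steps.set(stepId, { content: '' })
      this.order.push(stepId)
    }
    this.steps.get(stepId)!.content += delta
  }

  // Attach the provider signature emitted when the step finishes.
  finish(stepId: string, signature?: string): void {
    const step = this.steps.get(stepId)
    if (step && signature) step.signature = signature
  }

  // Materialize one ThinkingPart-like object per step, in order.
  toParts(): Array<{ stepId: string } & ThinkingStep> {
    return this.order.map((stepId) => ({ stepId, ...this.steps.get(stepId)! }))
  }
}

const tracker = new StepThinkingTracker()
tracker.append('step-1', 'plan the tool call')
tracker.finish('step-1', 'sig-1')
tracker.append('step-2', 'interpret the result')
tracker.finish('step-2', 'sig-2')
const parts = tracker.toParts()
```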

Changes

@tanstack/ai:

  • ThinkingPart gains optional stepId and signature fields
  • ModelMessage gains optional thinking array for multi-turn context
  • StepFinishedEvent gains optional signature field
  • StreamProcessor handles STEP_STARTED, tracks thinking per-step via Map<stepId, content>
  • TextEngine accumulates thinking + signatures per iteration, includes them in assistant messages
  • buildAssistantMessages preserves ThinkingParts into ModelMessage.thinking
  • updateThinkingPart keys on stepId instead of replacing a single part
  • onThinkingUpdate callback gains stepId parameter
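A minimal sketch of the updateThinkingPart behavior described in the bullets above, assuming a simplified ThinkingPart shape (the real function in message-updaters.ts takes different arguments; this only illustrates keying on stepId):

```typescript
// Simplified ThinkingPart; only stepId/signature come from the PR summary.
interface ThinkingPart {
  type: 'thinking'
  content: string
  stepId?: string
  signature?: string
}

// Match an existing part by stepId and update it in place, or append a new
// part, rather than overwriting a single merged thinking part.
function updateThinkingPart(
  parts: Array<ThinkingPart>,
  stepId: string,
  content: string,
  signature?: string,
): Array<ThinkingPart> {
  const existing = parts.find(
    (p) => p.type === 'thinking' && p.stepId === stepId,
  )
  if (existing) {
    existing.content = content
    if (signature) existing.signature = signature
    return parts
  }
  parts.push({ type: 'thinking', content, stepId, ...(signature ? { signature } : {}) })
  return parts
}

const uiParts: Array<ThinkingPart> = []
updateThinkingPart(uiParts, 'step-1', 'first')
updateThinkingPart(uiParts, 'step-2', 'second', 'sig-2')
updateThinkingPart(uiParts, 'step-1', 'first updated')
```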

@tanstack/ai-anthropic:

  • Captures signature_delta stream events for thinking block signatures
  • Emits final STEP_FINISHED with signature on content_block_stop
  • Includes thinking blocks with signatures in formatMessages for multi-turn history
  • Passes betas: ['interleaved-thinking-2025-05-14'] when thinking is enabled
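The signature capture described above can be sketched as a reducer over Anthropic-style stream events. The event shapes follow Anthropic's documented thinking_delta / signature_delta streaming deltas; the STEP_FINISHED emit shape is an assumption, not the adapter's actual code:

```typescript
// Event shapes mirror Anthropic's streaming deltas; the emitted STEP_FINISHED
// shape is hypothetical.
type StreamEvent =
  | {
      type: 'content_block_delta'
      delta:
        | { type: 'thinking_delta'; thinking: string }
        | { type: 'signature_delta'; signature: string }
    }
  | { type: 'content_block_stop' }

function reduceThinkingBlock(events: Array<StreamEvent>) {
  let content = ''
  let signature = ''
  const emitted: Array<{
    type: 'STEP_FINISHED'
    content: string
    signature?: string
  }> = []
  for (const event of events) {
    if (event.type === 'content_block_delta') {
      // Accumulate thinking text and signature chunks separately.
      if (event.delta.type === 'thinking_delta') content += event.delta.thinking
      else signature += event.delta.signature
    } else {
      // On content_block_stop, emit the final step event with its signature.
      emitted.push({ type: 'STEP_FINISHED', content, ...(signature ? { signature } : {}) })
    }
  }
  return emitted
}

const emitted = reduceThinkingBlock([
  { type: 'content_block_delta', delta: { type: 'thinking_delta', thinking: 'think ' } },
  { type: 'content_block_delta', delta: { type: 'thinking_delta', thinking: 'hard' } },
  { type: 'content_block_delta', delta: { type: 'signature_delta', signature: 'abc' } },
  { type: 'content_block_stop' },
])
```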

@tanstack/ai-client:

  • onThinkingUpdate callback updated for new stepId parameter

Test plan

  • 634 tests pass in @tanstack/ai (4 new tests for multi-step thinking)
  • 198 tests pass in @tanstack/ai-client
  • Types and eslint clean
  • Manual test: Anthropic Sonnet 4.5 — thinking on all 4 turns of multi-step tool flow
  • Manual test: OpenAI o4-mini — reasoning on multiple turns
  • Manual test: client tools with approval — no infinite loop, completes normally

Closes #340

Summary by CodeRabbit

  • New Features

    • Multi-step “thinking” is now supported: assistant messages can include multiple ordered thinking blocks (with per-step content and signatures) that are replayed in subsequent turns.
    • Streaming improves step-aware thinking delivery so intermediate thinking updates appear per step.
  • Tests

    • Added end-to-end and unit tests validating multi-step thinking, signatures, and UI rendering.
  • Chores

    • Changeset documenting thinking-block behavior and replay.

- Track thinking per-step via stepId instead of merging into single ThinkingPart
- Capture Anthropic signature_delta and preserve through the full stack
- Server-side TextEngine accumulates thinking + signatures per iteration
- Include thinking blocks in Anthropic message history for multi-turn context
- Add interleaved-thinking-2025-05-14 beta header when thinking is enabled
- Add tests for multi-step thinking, backward compat, and result aggregation

Closes TanStack#340
@coderabbitai (bot) commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Adds multi-step “thinking” support: streaming captures per-step thinking (content + optional signature), tracks step ordering and IDs, replays thinking into subsequent assistant messages, and enables Anthropic interleaved-thinking beta when a thinking budget is configured.

Changes

Cohort / File(s) and Summary

  • Anthropic Adapter (packages/typescript/ai-anthropic/src/adapters/text.ts): Conditionally enables interleaved-thinking-2025-05-14 when modelOptions.thinking with budget is present; formats thinking content blocks with signatures; captures signature_delta and emits signature on thinking completion.
  • Stream Processor Core (packages/typescript/ai/src/activities/chat/stream/processor.ts, packages/typescript/ai/src/activities/chat/stream/types.ts): Replaces single aggregate thinking state with per-step tracking: thinkingSteps, thinkingStepSignatures, thinkingStepOrder, currentThinkingStepId; adds STEP_STARTED handling; changes the onThinkingUpdate callback to (messageId, stepId, content).
  • Message Updaters (packages/typescript/ai/src/activities/chat/stream/message-updaters.ts): updateThinkingPart now requires stepId, matches/creates ThinkingPart by stepId, and stores optional signature on parts.
  • Chat Engine & Message Construction (packages/typescript/ai/src/activities/chat/index.ts, packages/typescript/ai/src/activities/chat/messages.ts): Accumulates thinking per iteration (accumulatedThinking, current step state), finalizes and attaches the thinking array (content + optional signature) to assistant messages when emitting tool-calls or assistant segments.
  • Types (packages/typescript/ai/src/types.ts): ModelMessage gains optional thinking?: Array<{ content: string; signature?: string }>; ThinkingPart gets optional stepId?: string and signature?: string; StepFinishedEvent accepts optional signature.
  • Client Wiring (packages/typescript/ai-client/src/chat-client.ts): Updates the StreamProcessor thinking event handler to accept the new (messageId, stepId, content) arity (adds an ignored _stepId param at call sites).
  • Stream Message Building & Tests (packages/typescript/ai/src/activities/chat/stream/message-updaters.ts, packages/typescript/ai/tests/message-updaters.test.ts, packages/typescript/ai/tests/stream-processor.test.ts): Tests and builders updated for step-aware thinking: new test helpers for STEP_STARTED, assertions for per-stepId thinking parts and signature handling; updateThinkingPart call sites updated to pass stepId.
  • Smoke Tests / Mocks / E2E (packages/typescript/smoke-tests/e2e/src/routes/api.mock-chat.ts, packages/typescript/smoke-tests/e2e/src/routes/mock.tsx, packages/typescript/smoke-tests/e2e/tests/thinking.spec.ts): Adds a thinking-multi-step mock scenario and E2E tests validating two distinct thinking parts (step IDs, content, signatures), and surfaces thinking metrics in mock UI attributes.
  • Changeset (.changeset/thinking-blocks-per-step.md): Documents changes: types, StreamProcessor signature change, Anthropic beta usage, and TextEngine thinking replay behavior.
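The type additions summarized in the Types entry above can be sketched as follows; the surrounding fields are illustrative, and only thinking, stepId, and signature come from the summary:

```typescript
// Illustrative shapes; only the new optional fields are from the PR summary.
interface ThinkingPart {
  type: 'thinking'
  content: string
  stepId?: string // new: identifies which step produced this part
  signature?: string // new: Anthropic signature for multi-turn replay
}

interface ModelMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  // new: ordered thinking blocks to replay on the next turn
  thinking?: Array<{ content: string; signature?: string }>
}

const assistant: ModelMessage = {
  role: 'assistant',
  content: 'Done.',
  thinking: [{ content: 'step one reasoning', signature: 'sig-1' }],
}
```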

Sequence Diagram

sequenceDiagram
    participant Client
    participant AnthropicAPI as Anthropic API
    participant StreamProcessor
    participant TextEngine
    participant MessageBuilder as Messages

    Client->>AnthropicAPI: Request (with thinkingBudget + betas)
    AnthropicAPI-->>StreamProcessor: STEP_STARTED (stepId)
    StreamProcessor->>StreamProcessor: record pending/current stepId

    AnthropicAPI-->>StreamProcessor: signature_delta (chunks)
    StreamProcessor->>StreamProcessor: accumulate signature

    AnthropicAPI-->>StreamProcessor: STEP_FINISHED (stepId, delta/content)
    StreamProcessor->>StreamProcessor: append thinkingSteps[stepId], store signature
    StreamProcessor->>MessageBuilder: updateThinkingPart(msgId, stepId, content, signature)
    StreamProcessor->>TextEngine: emit step events

    AnthropicAPI-->>StreamProcessor: text/tool chunks
    StreamProcessor->>TextEngine: process chunks, persist thinking into message
    TextEngine->>MessageBuilder: build assistant message including thinking array
    MessageBuilder-->>Client: deliver assistant message (content + thinking[] with signatures)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • jherr

Poem

🐰 I hopped through streams both wide and deep,
I gathered thoughts in little heaps;
Each step I traced with ribboned sign,
So thinking comes back, turn after time. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 55.56%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The PR title 'fix(ai, ai-anthropic): thinking blocks missing on turn 2+ in tool loops' clearly and specifically describes the main issue being fixed: thinking blocks not appearing after the first turn in tool loops.
  • Description check: ✅ Passed. The PR description comprehensively covers the changes, provides detailed breakdowns per package, and includes a complete test plan with manual verification steps.
  • Linked Issues check: ✅ Passed. The PR fully addresses issue #340 by implementing multi-step thinking tracking via stepId, preserving thinking content across turns, adding the Anthropic beta header, and including comprehensive tests.
  • Out of Scope Changes check: ✅ Passed. All code changes are directly scoped to fixing the thinking blocks issue in tool loops; no unrelated modifications are present.


@nx-cloud (bot) commented Mar 20, 2026

View your CI Pipeline Execution ↗ for commit c042c55

  • nx affected --targets=test:sherif,test:knip,tes...: ✅ Succeeded (3m 59s)
  • nx run-many --targets=build --exclude=examples/**: ✅ Succeeded (1m 28s)

☁️ Nx Cloud last updated this comment at 2026-03-20 21:44:56 UTC

@pkg-pr-new (bot) commented Mar 20, 2026

Open in StackBlitz

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@391

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@391

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@391

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@391

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@391

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@391

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@391

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@391

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@391

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@391

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@391

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@391

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@391

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@391

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@391

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@391

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@391

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@391

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@391

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@391

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@391

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@391

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@391

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@391

commit: c042c55

@coderabbitai (bot) left a comment (Contributor)

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/typescript/ai/src/activities/chat/messages.ts (1)

181-186: ⚠️ Potential issue | 🟠 Major

Round-tripping through ModelMessage still strips thinking.

Line 185 starts serializing thinking into assistant ModelMessages, but modelMessageToUIMessage() below never materializes modelMessage.thinking back into ThinkingParts. Any history hydrated through normalizeToUIMessage() / ChatClient.append() will silently drop the thinking blocks and signatures, so a later convertMessagesToModelMessages() sends stripped context again.

Patch sketch
 export function modelMessageToUIMessage(
   modelMessage: ModelMessage,
   id?: string,
 ): UIMessage {
   const parts: Array<MessagePart> = []
+
+  if (modelMessage.role === 'assistant' && modelMessage.thinking?.length) {
+    for (const thinkingPart of modelMessage.thinking) {
+      parts.push({
+        type: 'thinking',
+        content: thinkingPart.content,
+        ...(thinkingPart.signature && {
+          signature: thinkingPart.signature,
+        }),
+      })
+    }
+  }
 
   // Handle tool results (when role is "tool") - only produce tool-result part,
   // not a text part (the content IS the tool result, not display text)
   if (modelMessage.role === 'tool' && modelMessage.toolCallId) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/chat/messages.ts` around lines 181 -
186, The assistant message builder adds a thinking field to ModelMessage (see
the push that adds thinking), but
modelMessageToUIMessage()/normalizeToUIMessage()/ChatClient.append() never
rehydrate modelMessage.thinking back into ThinkingPart objects and signatures,
causing round-trip loss; update modelMessageToUIMessage (and any
normalizeToUIMessage/ChatClient.append code paths) to detect
modelMessage.thinking and reconstruct the original ThinkingPart shape (including
reasoning text and signature fields) so that convertMessagesToModelMessages
receives intact thinking blocks on subsequent conversions.
🧹 Nitpick comments (3)
packages/typescript/ai/tests/stream-processor.test.ts (1)

778-830: Add one signature propagation regression here.

These new cases cover stepId, but none assert that a STEP_FINISHED.signature survives into the stored thinking part. That field is what Anthropic needs on the follow-up turn, so this branch is still unguarded by tests.

Test sketch
+    it('should persist signature on a thinking step', () => {
+      const processor = new StreamProcessor()
+      processor.prepareAssistantMessage()
+
+      processor.processChunk(
+        chunk('STEP_FINISHED', {
+          stepId: 'step-1',
+          delta: 'thinking...',
+          signature: 'sig-1',
+        }),
+      )
+
+      expect(processor.getMessages()[0]?.parts[0]).toEqual({
+        type: 'thinking',
+        content: 'thinking...',
+        stepId: 'step-1',
+        signature: 'sig-1',
+      })
+    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/tests/stream-processor.test.ts` around lines 778 -
830, Add assertions to the tests that verify STEP_FINISHED.signature is
propagated into the stored thinking parts and into the aggregated
state.thinking: when using StreamProcessor.processChunk with
ev.stepFinished(..., stepId) include a signature value and assert that (from
processor.getMessages()[0]!.parts filtered by type 'thinking') each thinking
part has the same signature (e.g., for 'step-1' and 'step-2'), and in the
getResult()/getState() concatenation case assert that state.thinking retains
each step's signature information (or that the per-step signatures are
accessible on the stored thinking parts); locate code via
StreamProcessor.processChunk, ev.stepFinished, processor.getMessages, and
processor.getState to add these checks.
packages/typescript/ai-anthropic/src/adapters/text.ts (2)

395-405: Thinking blocks without signatures are silently filtered out.

The loop only pushes thinking blocks that have a signature. This is correct per Anthropic's API requirements (signatures are mandatory for replaying thinking in multi-turn context), but a brief comment explaining this would help maintainability.

📝 Suggested comment
 if (message.thinking?.length) {
   for (const thinking of message.thinking) {
+    // Anthropic requires signatures to replay thinking blocks in multi-turn context;
+    // blocks without signatures cannot be included.
     if (thinking.signature) {
       contentBlocks.push({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-anthropic/src/adapters/text.ts` around lines 395 -
405, Add a short explanatory comment above the loop that filters thinking blocks
to clarify that only thinking entries with a signature are included because
Anthropic requires signatures to replay thinking in multi-turn context;
reference the variables message.thinking, thinking.signature, the
contentBlocks.push of AnthropicContentBlock, and note that omitting unsigned
thinking blocks is intentional for API compliance and not a bug.

292-295: Add a comment explaining the beta header requirement and type assertion.

The beta header interleaved-thinking-2025-05-14 is required to enable interleaved thinking in Claude models during tool-use conversations. Add a brief comment explaining why this beta is needed and why the as any cast is used (the Anthropic SDK types don't include betas in this parameter definition).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-anthropic/src/adapters/text.ts` around lines 292 -
295, Add an inline comment above the conditional that sets betas explaining that
the beta header "interleaved-thinking-2025-05-14" is required to enable
interleaved thinking for Claude during tool-use conversations, and note that the
`as any` cast is used because the Anthropic SDK types do not include `betas` on
this parameter; update the block around the `thinkingBudget` conditional and the
`betas` property to include that short explanatory comment next to `betas` and
`as any` so future readers understand both the runtime requirement and the
type-workaround.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai/src/activities/chat/messages.ts`:
- Around line 168-169: The issue is that pendingThinking is buffered separately
from current.toolCalls, which loses the original interleaving (e.g., thinking,
tool-call, thinking) when constructing ModelMessage; change the buffering to a
single ordered stream of typed entries (e.g., an array of {type:
'thinking'|'toolCall', payload: ...}) instead of separate pendingThinking and
current.toolCalls so you can preserve insert order when emitting/serializing;
update all places that push to pendingThinking and to current.toolCalls (and the
duplicate buffering logic referenced around the 233-239 region) to push a typed
entry into the unified buffer and update the code that builds the assistant
ModelMessage to iterate this unified buffer and emit thinking/toolCall entries
in original sequence.

In `@packages/typescript/ai/src/activities/chat/stream/processor.ts`:
- Line 146: The field pendingThinkingStepId is not cleared in
resetStreamState(), causing stale step IDs to persist across streams; update the
resetStreamState() method to set this.pendingThinkingStepId = null so that
prepareAssistantMessage() starting a new stream cannot inherit an old id, and
ensure any other stream-reset paths also call resetStreamState() (or explicitly
clear pendingThinkingStepId) to avoid leaking the previous STEP_STARTED
association.
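The reset fix described above can be reproduced with a stripped-down processor. The pendingThinkingStepId field name and the resetStreamState()/STEP_STARTED flow mirror the prompt; everything else is hypothetical:

```typescript
// Minimal illustration of the leak and its fix; not the real StreamProcessor.
class MiniProcessor {
  pendingThinkingStepId: string | null = null

  // STEP_STARTED records which step the next thinking delta belongs to.
  onStepStarted(stepId: string): void {
    this.pendingThinkingStepId = stepId
  }

  resetStreamState(): void {
    // The fix: without this line, the previous stream's stepId leaks and a
    // later STEP_FINISHED delta gets misattributed to the old step.
    this.pendingThinkingStepId = null
  }
}

const proc = new MiniProcessor()
proc.onStepStarted('step-from-old-stream')
proc.resetStreamState() // a new stream begins
const leaked = proc.pendingThinkingStepId
```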

---

Outside diff comments:
In `@packages/typescript/ai/src/activities/chat/messages.ts`:
- Around line 181-186: The assistant message builder adds a thinking field to
ModelMessage (see the push that adds thinking), but
modelMessageToUIMessage()/normalizeToUIMessage()/ChatClient.append() never
rehydrate modelMessage.thinking back into ThinkingPart objects and signatures,
causing round-trip loss; update modelMessageToUIMessage (and any
normalizeToUIMessage/ChatClient.append code paths) to detect
modelMessage.thinking and reconstruct the original ThinkingPart shape (including
reasoning text and signature fields) so that convertMessagesToModelMessages
receives intact thinking blocks on subsequent conversions.

---

Nitpick comments:
In `@packages/typescript/ai-anthropic/src/adapters/text.ts`:
- Around line 395-405: Add a short explanatory comment above the loop that
filters thinking blocks to clarify that only thinking entries with a signature
are included because Anthropic requires signatures to replay thinking in
multi-turn context; reference the variables message.thinking,
thinking.signature, the contentBlocks.push of AnthropicContentBlock, and note
that omitting unsigned thinking blocks is intentional for API compliance and not
a bug.
- Around line 292-295: Add an inline comment above the conditional that sets
betas explaining that the beta header "interleaved-thinking-2025-05-14" is
required to enable interleaved thinking for Claude during tool-use
conversations, and note that the `as any` cast is used because the Anthropic SDK
types do not include `betas` on this parameter; update the block around the
`thinkingBudget` conditional and the `betas` property to include that short
explanatory comment next to `betas` and `as any` so future readers understand
both the runtime requirement and the type-workaround.

In `@packages/typescript/ai/tests/stream-processor.test.ts`:
- Around line 778-830: Add assertions to the tests that verify
STEP_FINISHED.signature is propagated into the stored thinking parts and into
the aggregated state.thinking: when using StreamProcessor.processChunk with
ev.stepFinished(..., stepId) include a signature value and assert that (from
processor.getMessages()[0]!.parts filtered by type 'thinking') each thinking
part has the same signature (e.g., for 'step-1' and 'step-2'), and in the
getResult()/getState() concatenation case assert that state.thinking retains
each step's signature information (or that the per-step signatures are
accessible on the stored thinking parts); locate code via
StreamProcessor.processChunk, ev.stepFinished, processor.getMessages, and
processor.getState to add these checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 19b0cdc6-d9a2-4d83-bb64-7a1aef5b7b29

📥 Commits

Reviewing files that changed from the base of the PR and between c3583e3 and c042c55.

📒 Files selected for processing (10)
  • packages/typescript/ai-anthropic/src/adapters/text.ts
  • packages/typescript/ai-client/src/chat-client.ts
  • packages/typescript/ai/src/activities/chat/index.ts
  • packages/typescript/ai/src/activities/chat/messages.ts
  • packages/typescript/ai/src/activities/chat/stream/message-updaters.ts
  • packages/typescript/ai/src/activities/chat/stream/processor.ts
  • packages/typescript/ai/src/activities/chat/stream/types.ts
  • packages/typescript/ai/src/types.ts
  • packages/typescript/ai/tests/message-updaters.test.ts
  • packages/typescript/ai/tests/stream-processor.test.ts

Comment on lines +168 to 169
let pendingThinking: Array<{ content: string; signature?: string }> = []

@coderabbitai (bot) commented

⚠️ Potential issue | 🟠 Major

This still loses thinking/tool-call order within a single turn.

pendingThinking is buffered independently from current.toolCalls, so a sequence like [thinking(step-1), tool-call(a), thinking(step-2), tool-call(b)] gets flattened into one assistant ModelMessage with thinking: [step-1, step-2] and toolCalls: [a, b]. If a provider emits multiple thinking blocks around multiple tool calls in one turn, the original interleaving is gone and the next request cannot faithfully replay that history.

Also applies to: 233-239

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/activities/chat/messages.ts` around lines 168 -
169, The issue is that pendingThinking is buffered separately from
current.toolCalls, which loses the original interleaving (e.g., thinking,
tool-call, thinking) when constructing ModelMessage; change the buffering to a
single ordered stream of typed entries (e.g., an array of {type:
'thinking'|'toolCall', payload: ...}) instead of separate pendingThinking and
current.toolCalls so you can preserve insert order when emitting/serializing;
update all places that push to pendingThinking and to current.toolCalls (and the
duplicate buffering logic referenced around the 233-239 region) to push a typed
entry into the unified buffer and update the code that builds the assistant
ModelMessage to iterate this unified buffer and emit thinking/toolCall entries
in original sequence.
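The unified-buffer fix suggested above can be sketched as a single ordered array of typed entries replacing the separate pendingThinking and current.toolCalls buffers (the entry shapes here are hypothetical):

```typescript
// One ordered buffer preserves the original thinking/tool-call interleaving.
type TurnEntry =
  | { type: 'thinking'; content: string; signature?: string }
  | { type: 'toolCall'; id: string; name: string }

const buffer: Array<TurnEntry> = []
buffer.push({ type: 'thinking', content: 'plan call a', signature: 'sig-1' })
buffer.push({ type: 'toolCall', id: 'a', name: 'search' })
buffer.push({ type: 'thinking', content: 'plan call b', signature: 'sig-2' })
buffer.push({ type: 'toolCall', id: 'b', name: 'fetch' })

// Emission iterates the unified buffer, so [thinking, tool-call, thinking,
// tool-call] comes out in the same order instead of being flattened into two
// separate lists.
const order = buffer.map((entry) => entry.type)
```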

Comment thread packages/typescript/ai/src/activities/chat/stream/processor.ts
…int; clear stale pendingThinkingStepId

- Move `betas: ['interleaved-thinking-2025-05-14']` out of the shared mapper
  and onto the `beta.messages.create` call site in chatStream. Prevents the
  non-beta structuredOutput endpoint from receiving an invalid `betas` field
  when a thinking budget is configured.
- Clear `pendingThinkingStepId` when a later STEP_STARTED takes the
  active-message branch, and also in `resetStreamState`, so a stale pending
  id can't misattribute a later STEP_FINISHED's delta to an earlier step.
- Add covering test for the pendingThinkingStepId leak (red-green verified).
- Add `thinking-multi-step` mock scenario to the e2e harness emitting
  STEP_STARTED/STEP_FINISHED pairs for two distinct stepIds with
  provider signatures, followed by a text message and RUN_FINISHED.
- Expose thinkingPartCount / thinkingStepIds on the mock chat page via
  data-* attributes for assertion.
- Add tests/thinking.spec.ts asserting two ThinkingParts with distinct
  stepIds and matching signatures are produced (pre-PR behavior merged
  them into a single part).
- Add .changeset/thinking-blocks-per-step.md bumping @tanstack/ai,
  @tanstack/ai-anthropic, and @tanstack/ai-client.
@coderabbitai (bot) left a comment (Contributor)

Actionable comments posted: 2

🧹 Nitpick comments (3)
packages/typescript/smoke-tests/e2e/tests/thinking.spec.ts (2)

48-51: Tighten types on getMessages and filter callbacks.

The Array<any> return type and the (p: any) predicates throughout the tests disable type-checking on the assertions (e.g. assistantMessage.parts, firstStep.content, firstStep.signature). Since UIMessage is already exported from @tanstack/ai-client and used in mock.tsx, importing it here would catch shape regressions (e.g. if ThinkingPart.signature is renamed) at compile time rather than flakily at runtime.

Suggested change
-import { test, expect, type Page } from '@playwright/test'
+import { expect, test } from '@playwright/test'
+import type { Page } from '@playwright/test'
+import type { UIMessage } from '@tanstack/ai-client'
@@
-async function getMessages(page: Page): Promise<Array<any>> {
+async function getMessages(page: Page): Promise<Array<UIMessage>> {
   const jsonContent = await page.locator('#messages-json-content').textContent()
   return JSON.parse(jsonContent || '[]')
 }

This also addresses the ESLint sort-imports and import/consistent-type-specifier-style hints on line 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/smoke-tests/e2e/tests/thinking.spec.ts` around lines 48 -
51, Tighten the types by importing UIMessage from "@tanstack/ai-client" and
update getMessages to return Promise<UIMessage[]> (change the function signature
and JSON parse typing), then replace loose any usages in filter/map callbacks
(e.g., predicates like (p: any)) with the concrete UIMessage type and update
assertions to use the typed properties (e.g., assistantMessage.parts,
firstStep.content, firstStep.signature) so TypeScript catches shape/regression
changes; ensure imports follow the project's import style to address
sort-imports and import/consistent-type-specifier-style hints.

115-122: afterEach is outside the describe block.

The test.afterEach hook at line 115 is declared after the describe closes on line 113, so it attaches to the root file scope rather than the Chat E2E Tests - Multi-Step Thinking (Mock API) group. That's functionally fine today (only one describe block in this file), but if additional describes are added later the screenshot hook will run for all of them, which may or may not be intended. Consider moving it inside the describe for clearer scoping.

Suggested change
 test.describe('Chat E2E Tests - Multi-Step Thinking (Mock API)', () => {
   ...
+
+  test.afterEach(async ({ page }, testInfo) => {
+    if (testInfo.status !== testInfo.expectedStatus) {
+      await page.screenshot({
+        path: `test-results/failure-${testInfo.title.replace(/\s+/g, '-')}.png`,
+        fullPage: true,
+      })
+    }
+  })
 })
-
-test.afterEach(async ({ page }, testInfo) => {
-  if (testInfo.status !== testInfo.expectedStatus) {
-    await page.screenshot({
-      path: `test-results/failure-${testInfo.title.replace(/\s+/g, '-')}.png`,
-      fullPage: true,
-    })
-  }
-})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/smoke-tests/e2e/tests/thinking.spec.ts` around lines 115
- 122, The screenshot cleanup hook test.afterEach is currently declared outside
the describe "Chat E2E Tests - Multi-Step Thinking (Mock API)" so it attaches to
the root scope; move the test.afterEach block inside that describe (before its
closing brace) so the hook is scoped to that describe only, preserving the same
screenshot logic and parameters (page, testInfo) and keeping the filename
generation code unchanged.
packages/typescript/smoke-tests/e2e/src/routes/api.mock-chat.ts (1)

310-394: Two STEP_FINISHED events per step — intentional but worth a comment.

Each thinking step emits STEP_FINISHED twice (first with a delta, then with content + signature). This appears to intentionally simulate streaming deltas followed by a final summary event, but STEP_FINISHED conventionally fires once per step. Since the PR's tests pass and rely on both events being processed (final signature stored on the step), consider adding a brief inline comment documenting that the first STEP_FINISHED carries the incremental delta and the second carries the final content/signature, so future readers don't mistake it for a duplicate event bug.

Also: the Date.now() calls in the chunk literals are evaluated at module load time; createMockStream already overrides timestamp on yield (line 407), so these initial values are dead — not a defect, just noise. Consider using a placeholder like timestamp: 0 for the static chunks to make that clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/smoke-tests/e2e/src/routes/api.mock-chat.ts` around lines
310 - 394, Add a short inline comment above the "thinking-multi-step" mock
describing that each thinking step intentionally emits two STEP_FINISHED events
(first carrying the streaming delta, second carrying final content+signature) so
readers don't treat it as a duplicate; also replace the Date.now() timestamps in
the chunk literals with a placeholder like 0 since createMockStream overrides
timestamps at yield (see "thinking-multi-step", STEP_FINISHED entries and
createMockStream).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-anthropic/src/adapters/text.ts`:
- Around line 407-417: The code drops signed thinking for assistant messages
that have no toolCalls; ensure any message.thinking items with a signature are
always appended to contentBlocks as AnthropicContentBlock regardless of the
assistant/toolCalls branch. Update the logic around the handling of
message.thinking (the loop that pushes {type:'thinking', thinking:
thinking.content, signature: thinking.signature} into contentBlocks) so it runs
for both the assistant+toolCalls branch and the later assistant-only branch (the
code paths around the existing message.thinking loop and the block at lines
~468-478); you can either hoist the loop above the branch so it always executes
for message.thinking or add the same signed-thinking push into the
assistant-only branch, referencing message.thinking, contentBlocks, and
AnthropicContentBlock.

In `@packages/typescript/ai/src/activities/chat/stream/processor.ts`:
- Around line 1119-1136: The code currently prefers state.currentThinkingStepId
over chunk.stepId which causes distinct chunk.stepId values to be merged into an
existing thinking step; change the selection to prefer the explicit chunk.stepId
when present: compute stepId = chunk.stepId ?? state.currentThinkingStepId, then
if the resolved stepId is not present in state.thinkingSteps, initialize it
(state.thinkingSteps.set(stepId, ''), state.thinkingStepOrder.push(stepId)) and
update state.currentThinkingStepId = stepId; keep the existing
pendingThinkingStepId consumption logic but ensure it does not override an
explicit chunk.stepId.

---

Nitpick comments:
In `@packages/typescript/smoke-tests/e2e/src/routes/api.mock-chat.ts`:
- Around line 310-394: Add a short inline comment above the
"thinking-multi-step" mock describing that each thinking step intentionally
emits two STEP_FINISHED events (first carrying the streaming delta, second
carrying final content+signature) so readers don't treat it as a duplicate; also
replace the Date.now() timestamps in the chunk literals with a placeholder like
0 since createMockStream overrides timestamps at yield (see
"thinking-multi-step", STEP_FINISHED entries and createMockStream).

In `@packages/typescript/smoke-tests/e2e/tests/thinking.spec.ts`:
- Around line 48-51: Tighten the types by importing UIMessage from
"@tanstack/ai-client" and update getMessages to return Promise<UIMessage[]>
(change the function signature and JSON parse typing), then replace loose any
usages in filter/map callbacks (e.g., predicates like (p: any)) with the
concrete UIMessage type and update assertions to use the typed properties (e.g.,
assistantMessage.parts, firstStep.content, firstStep.signature) so TypeScript
catches shape/regression changes; ensure imports follow the project's import
style to address sort-imports and import/consistent-type-specifier-style hints.
- Around line 115-122: The screenshot cleanup hook test.afterEach is currently
declared outside the describe "Chat E2E Tests - Multi-Step Thinking (Mock API)"
so it attaches to the root scope; move the test.afterEach block inside that
describe (before its closing brace) so the hook is scoped to that describe only,
preserving the same screenshot logic and parameters (page, testInfo) and keeping
the filename generation code unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 52fe2bf7-eb5c-4660-a208-3e1292bede10

📥 Commits

Reviewing files that changed from the base of the PR and between c042c55 and c31d45d.

📒 Files selected for processing (7)
  • .changeset/thinking-blocks-per-step.md
  • packages/typescript/ai-anthropic/src/adapters/text.ts
  • packages/typescript/ai/src/activities/chat/stream/processor.ts
  • packages/typescript/ai/tests/stream-processor.test.ts
  • packages/typescript/smoke-tests/e2e/src/routes/api.mock-chat.ts
  • packages/typescript/smoke-tests/e2e/src/routes/mock.tsx
  • packages/typescript/smoke-tests/e2e/tests/thinking.spec.ts
✅ Files skipped from review due to trivial changes (1)
  • .changeset/thinking-blocks-per-step.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/typescript/ai/tests/stream-processor.test.ts

Comment on lines +407 to +417
if (message.thinking?.length) {
for (const thinking of message.thinking) {
if (thinking.signature) {
contentBlocks.push({
type: 'thinking',
thinking: thinking.content,
signature: thinking.signature,
} as unknown as AnthropicContentBlock)
}
}
}
⚠️ Potential issue | 🟠 Major

Replay signed thinking for non-tool assistant messages too.

Line 407 only preserves thinking inside the assistant && toolCalls branch. Assistant responses that contain signed thinking but no tool calls fall through to lines 468-478, so their message.thinking is dropped from future Anthropic context.

🐛 Proposed fix
+    const appendThinkingBlocks = (
+      contentBlocks: AnthropicContentBlocks,
+      thinkingParts: ModelMessage['thinking'],
+    ) => {
+      if (!thinkingParts?.length) return
+
+      for (const thinking of thinkingParts) {
+        if (thinking.signature) {
+          contentBlocks.push({
+            type: 'thinking',
+            thinking: thinking.content,
+            signature: thinking.signature,
+          } as unknown as AnthropicContentBlock)
+        }
+      }
+    }
+
     for (const message of messages) {
       const role = message.role
@@
       if (role === 'assistant' && message.toolCalls?.length) {
         const contentBlocks: AnthropicContentBlocks = []
 
-        if (message.thinking?.length) {
-          for (const thinking of message.thinking) {
-            if (thinking.signature) {
-              contentBlocks.push({
-                type: 'thinking',
-                thinking: thinking.content,
-                signature: thinking.signature,
-              } as unknown as AnthropicContentBlock)
-            }
-          }
-        }
+        appendThinkingBlocks(contentBlocks, message.thinking)
@@
+      if (role === 'assistant' && message.thinking?.length) {
+        const contentBlocks: AnthropicContentBlocks = []
+        appendThinkingBlocks(contentBlocks, message.thinking)
+
+        if (message.content) {
+          contentBlocks.push({
+            type: 'text',
+            text: typeof message.content === 'string' ? message.content : '',
+          } as AnthropicContentBlock)
+        }
+
+        formattedMessages.push({
+          role: 'assistant',
+          content: contentBlocks,
+        })
+        continue
+      }
+
       formattedMessages.push({

Also applies to: 468-478
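
The hoisted-helper approach in the proposed fix can be exercised standalone as below. The types here are simplified stand-ins, not the adapter's real `AnthropicContentBlock` definitions:

```typescript
// Simplified stand-ins for the adapter's types (illustrative only).
type ThinkingPart = { content: string; signature?: string }
type ContentBlock = { type: 'thinking'; thinking: string; signature: string }

// Hoisted helper: replays signed thinking for any assistant message,
// whether or not it also carries tool calls.
function appendThinkingBlocks(
  blocks: Array<ContentBlock>,
  thinking?: Array<ThinkingPart>,
): void {
  if (!thinking?.length) return
  for (const part of thinking) {
    // Unsigned thinking is skipped, matching the signature guard in the original loop
    if (part.signature) {
      blocks.push({
        type: 'thinking',
        thinking: part.content,
        signature: part.signature,
      })
    }
  }
}

// Both branches call the same helper, so signed thinking survives either way.
const withTools: Array<ContentBlock> = []
appendThinkingBlocks(withTools, [{ content: 'plan tool call', signature: 'sig-1' }])

const withoutTools: Array<ContentBlock> = []
appendThinkingBlocks(withoutTools, [
  { content: 'final reasoning', signature: 'sig-2' },
  { content: 'unsigned draft' }, // dropped: no signature
])
console.log(withTools.length, withoutTools.length)
```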


Comment on lines +1119 to +1136
// Consume pending stepId from STEP_STARTED that arrived before the message existed
if (this.pendingThinkingStepId) {
state.currentThinkingStepId = this.pendingThinkingStepId
if (!state.thinkingSteps.has(this.pendingThinkingStepId)) {
state.thinkingSteps.set(this.pendingThinkingStepId, '')
state.thinkingStepOrder.push(this.pendingThinkingStepId)
}
this.pendingThinkingStepId = null
}

const stepId = state.currentThinkingStepId ?? chunk.stepId

// Auto-initialize if no prior STEP_STARTED (backward compat)
if (!state.thinkingSteps.has(stepId)) {
state.thinkingSteps.set(stepId, '')
state.thinkingStepOrder.push(stepId)
state.currentThinkingStepId = stepId
}
⚠️ Potential issue | 🟠 Major

Prefer the chunk’s explicit stepId to avoid merging distinct thinking steps.

Line 1129 ignores chunk.stepId whenever currentThinkingStepId is already set. If a replay/provider emits STEP_FINISHED events with distinct stepIds but without STEP_STARTED, later steps are accumulated into the previous ThinkingPart.

🐛 Proposed fix
-    const stepId = state.currentThinkingStepId ?? chunk.stepId
+    const stepId = chunk.stepId ?? state.currentThinkingStepId
 
     // Auto-initialize if no prior STEP_STARTED (backward compat)
     if (!state.thinkingSteps.has(stepId)) {
       state.thinkingSteps.set(stepId, '')
       state.thinkingStepOrder.push(stepId)
-      state.currentThinkingStepId = stepId
     }
+    state.currentThinkingStepId = stepId
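
A minimal runnable repro of the fixed resolution order, using a simplified stand-in for the processor state rather than the real `StreamProcessor` internals:

```typescript
// Simplified stand-in for the processor's per-message thinking state.
type State = {
  currentThinkingStepId: string | null
  thinkingSteps: Map<string, string>
  thinkingStepOrder: Array<string>
}

function applyStepFinished(
  state: State,
  chunk: { stepId?: string; delta: string },
): void {
  // Fixed order: an explicit chunk.stepId wins over the sticky current id
  const stepId = chunk.stepId ?? state.currentThinkingStepId
  if (stepId == null) return // no id available (backward-compat guard)
  if (!state.thinkingSteps.has(stepId)) {
    state.thinkingSteps.set(stepId, '')
    state.thinkingStepOrder.push(stepId)
  }
  state.currentThinkingStepId = stepId
  state.thinkingSteps.set(stepId, state.thinkingSteps.get(stepId)! + chunk.delta)
}

const state: State = {
  currentThinkingStepId: null,
  thinkingSteps: new Map(),
  thinkingStepOrder: [],
}
// Two STEP_FINISHED chunks with distinct stepIds and no STEP_STARTED between them:
// with the old order they would merge into one step; here they stay distinct.
applyStepFinished(state, { stepId: 'step-1', delta: 'first' })
applyStepFinished(state, { stepId: 'step-2', delta: 'second' })
console.log(state.thinkingStepOrder)
```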



Development

Successfully merging this pull request may close these issues.

Anthropic tool loops: thinking blocks missing on turn 2+
